Section: New Results
Audio-Source Separation
We addressed the problem of separating audio sources from both static and time-varying convolutive mixtures. We proposed an unsupervised probabilistic framework based on the local complex-Gaussian model combined with non-negative matrix factorization [22]. The time-varying mixing filters are modeled by a continuous temporal stochastic process. This model extends the static-filter case, which corresponds to static audio sources. While static filters can be learned in advance, e.g. [6], time-varying filters cannot, which makes the problem more complex. We developed a variational expectation-maximization (VEM) algorithm that employs a Kalman smoother to estimate the time-varying mixing matrix and jointly estimates the source parameters. In 2017 we extended this method to incorporate the concept of diarization. Indeed, audio sources such as speaking persons do not emit continuously, but merely take "turns". We formally modeled speech turn-taking within a combined separation and diarization formulation [45], [44]. We also started to investigate the use of the convolutive transfer function for audio-source separation [49], [48], [54].
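As a rough illustration of one building block mentioned above, the following sketch factorizes a power spectrogram with multiplicative-update NMF under the Itakura-Saito divergence, the cost function classically paired with the local complex-Gaussian source model. This is a minimal, self-contained example, not the method of [22]: the function name, the random initialization, and the iteration count are illustrative choices.

```python
import numpy as np

def nmf_is(V, K, n_iter=200, seed=0, eps=1e-10):
    """Illustrative Itakura-Saito NMF: V ~= W @ H, with V a
    (frequencies x frames) power spectrogram and K components.
    Uses the standard multiplicative updates for the IS divergence."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        Vh = W @ H + eps
        # Multiplicative update for the dictionary W
        W *= ((V / Vh**2) @ H.T) / ((1.0 / Vh) @ H.T)
        Vh = W @ H + eps
        # Multiplicative update for the activations H
        H *= (W.T @ (V / Vh**2)) / (W.T @ (1.0 / Vh))
    return W, H

def is_divergence(V, Vh):
    """Itakura-Saito divergence between V and its approximation Vh."""
    R = V / Vh
    return float(np.sum(R - np.log(R) - 1.0))
```

In the full framework the factors W and H parameterize the source power spectral densities, while the time-varying mixing filters are tracked separately by the Kalman smoother inside the VEM iterations.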
Websites:
https://team.inria.fr/perception/research/vemove/